distance-generating function


On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games

Neural Information Processing Systems

First-order methods (FOMs) are arguably the most scalable algorithms for equilibrium computation in large extensive-form games. To operationalize these methods, a distance-generating function, acting as a regularizer for the strategy space, must be chosen. The ratio between the strong convexity modulus and the diameter of the regularizer is a key parameter in the analysis of FOMs. A natural question is then: what is the optimal distance-generating function for extensive-form decision spaces? In this paper, we make a number of contributions, ultimately establishing that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors. The DilEnt regularizer is notable due to its iterate equivalence with Kernelized OMWU (KOMWU)---the algorithm with state-of-the-art dependence on the game tree size in extensive-form games---when used in conjunction with the online mirror descent (OMD) algorithm. However, the standard analysis for OMD is unable to establish such a result; the only known analysis proceeds by appealing to the iterate equivalence with KOMWU. We close this gap by introducing a pair of primal-dual treeplex norms, which we contend form the natural analytic viewpoint for studying the strong convexity of DilEnt. Using this norm pair, we recover the diameter-to-strong-convexity ratio that predicts the same performance as KOMWU. Together with a new regret lower bound for online learning in sequence-form strategy spaces, this shows that the ratio is nearly optimal. Finally, we showcase our analytic techniques by refining the analysis of Clairvoyant OMD when paired with DilEnt, establishing an $\mathcal{O}(n \log |\mathcal{V}| \log T / T)$ approximation rate to coarse correlated equilibrium in $n$-player games, where $|\mathcal{V}|$ is the number of reduced normal-form strategies of the players; this establishes a new state of the art.
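For readers who want the central object in concrete form, the following is a minimal sketch of the weight-one DilEnt regularizer in Python, assuming a dict-based sequence-form representation in which a designated root sequence carries mass 1; the identifiers are illustrative and not taken from the paper.

```python
import numpy as np

def dilated_entropy(x, infosets):
    """Weight-one dilated entropy (DilEnt) of a sequence-form strategy.

    x        : dict mapping each sequence to its reach probability,
               with the root sequence carrying mass 1.0
    infosets : list of (parent_sequence, child_sequences) pairs, one per
               decision point; the children's masses sum to the parent's.
    """
    val = 0.0
    for parent, children in infosets:
        xp = x[parent]
        for c in children:
            if x[c] > 0.0:
                # entropy of the local decision, dilated by the parent's mass
                val += x[c] * np.log(x[c] / xp)
    return val

# Tiny treeplex: a single decision point with two actions.
x = {"root": 1.0, "a": 0.7, "b": 0.3}
print(dilated_entropy(x, [("root", ["a", "b"])]))  # = 0.7*log(0.7) + 0.3*log(0.3)
```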



Federated Composite Saddle Point Optimization

Bai, Site, Bullins, Brian

arXiv.org Artificial Intelligence

Federated learning (FL) approaches for saddle point problems (SPP) have recently gained popularity due to the critical role such problems play in machine learning (ML). Existing works mostly target smooth, unconstrained objectives in Euclidean space, whereas ML problems often involve constraints or non-smooth regularization, which calls for composite optimization. Addressing these issues, we propose Federated Dual Extrapolation (FeDualEx), an extra-step primal-dual algorithm and the first of its kind to encompass both saddle point optimization and composite objectives under the FL paradigm. Both the convergence analysis and the empirical evaluation demonstrate the effectiveness of FeDualEx in these challenging settings. In addition, even for the sequential version of FeDualEx, we provide rates for the stochastic composite saddle point setting which, to our knowledge, are not found in prior literature.
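As a point of reference, here is a minimal sequential sketch of the extra-step primal-dual template on a composite bilinear saddle point, with the non-smooth $\ell_1$ term handled through its proximal operator. This is not FeDualEx itself (the federated averaging and dual-extrapolation machinery are omitted); the problem instance, step size, and names are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_ball(v, r=1.0):
    """Euclidean projection onto the ball of radius r."""
    nv = np.linalg.norm(v)
    return v if nv <= r else v * (r / nv)

def composite_extragradient(A, lam=0.1, eta=0.05, T=500):
    """Extra-step primal-dual iteration for
        min_x max_{||y|| <= 1}  x^T A y + lam * ||x||_1.
    The smooth coupling is handled by gradients, the composite terms by
    prox/projection; the extra step evaluates gradients at a look-ahead point."""
    n, m = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # leading (extrapolation) step at the current point
        x_h = prox_l1(x - eta * (A @ y), eta * lam)
        y_h = project_ball(y + eta * (A.T @ x))
        # correction step using the gradients at the extrapolated point
        x = prox_l1(x - eta * (A @ y_h), eta * lam)
        y = project_ball(y + eta * (A.T @ x_h))
    return x, y
```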


Mirror descent in saddle-point problems: Going the extra (gradient) mile

Mertikopoulos, Panayotis, Zenati, Houssam, Lecouat, Bruno, Foo, Chuan-Sheng, Chandrasekhar, Vijay, Piliouras, Georgios

arXiv.org Machine Learning

Owing to their connection with generative adversarial networks (GANs), saddle-point problems have recently attracted considerable interest in machine learning and beyond. By necessity, most theoretical guarantees revolve around convex-concave problems; however, making theoretical inroads towards efficient GAN training crucially depends on moving beyond this classic framework. To make piecemeal progress along these lines, we analyze the widely used mirror descent (MD) method in a class of non-monotone problems - called coherent - whose solutions coincide with those of a naturally associated variational inequality. Our first result is that, under strict coherence (a condition satisfied by all strictly convex-concave problems), MD methods converge globally; however, they may fail to converge even in simple, bilinear models. To mitigate this deficiency, we add on an "extra-gradient" step which we show stabilizes MD methods by looking ahead and using a "future gradient". These theoretical results are subsequently validated by numerical experiments in GANs.
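The "extra-gradient" modification is easy to state concretely. Below is a minimal sketch of entropic mirror descent with a look-ahead step on a bilinear saddle point over two simplices; the matrix game, step size, and iteration count are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def md_step(p, grad, eta):
    """Entropic mirror descent step on the simplex (multiplicative weights)."""
    q = p * np.exp(-eta * grad)
    return q / q.sum()

def extragradient_md(A, eta=0.1, T=1000):
    """MD with an extra-gradient step for min_x max_y x^T A y over simplices."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # leading step: plain MD from the current point
        x_h = md_step(x, A @ y, eta)
        y_h = md_step(y, -(A.T @ x), eta)
        # extra-gradient step: update using the "future" gradient at (x_h, y_h)
        x = md_step(x, A @ y_h, eta)
        y = md_step(y, -(A.T @ x_h), eta)
        x_avg += x_h
        y_avg += y_h
    return x_avg / T, y_avg / T  # averaged look-ahead iterates
```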


Sparse Non Gaussian Component Analysis by Semidefinite Programming

Diederichs, Elmar, Juditsky, Anatoli, Nemirovski, Arkadi, Spokoiny, Vladimir

arXiv.org Machine Learning

Sparse non-Gaussian component analysis (SNGCA) is an unsupervised method for extracting a linear structure from high-dimensional data by estimating a low-dimensional non-Gaussian component. In this paper we discuss a new approach to direct estimation of the projector onto the target space, based on semidefinite programming, which improves the method's sensitivity to a broad variety of deviations from normality. We also discuss procedures that allow recovering the structure when its effective dimension is unknown.
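To illustrate the flavor of the semidefinite-programming step, here is a generic sketch that recovers a relaxed rank-$m$ projector aligned with estimated non-Gaussian directions, using cvxpy. The actual SNGCA program differs in how deviation from normality enters the estimate, so this is purely an assumption-laden illustration; the function name and the way directions are aggregated are hypothetical.

```python
import numpy as np
import cvxpy as cp

def estimate_projector(G, m):
    """Generic SDP sketch: find P with 0 <= P <= I and trace(P) = m
    (a relaxed rank-m projector) maximizing alignment with the columns of G,
    which stand in for estimated non-Gaussian directions."""
    d = G.shape[0]
    S = G @ G.T  # aggregate the directions into a scatter matrix
    P = cp.Variable((d, d), symmetric=True)
    constraints = [P >> 0, np.eye(d) - P >> 0, cp.trace(P) == m]
    cp.Problem(cp.Maximize(cp.trace(P @ S)), constraints).solve()
    return P.value  # round to the top-m eigenspace if a true projector is needed
```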